-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: trim import time ~5% #28227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: trim import time ~5% #28227
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add these modules to the blacklist in ci/code_checks.sh
?
pandas/io/common.py
Outdated
Lazy-import wrapper for stdlib urlopen, as that imports a big chunk of | ||
the stdlib. | ||
""" | ||
from urllib.request import urlopen as _urlopen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the convention is trailing underscores to prevent clashes.
Or,
import urllib.request
return urllib.request.urlopen(*args, **kwargs)
Hmm so if we only expect a 5% improvement in import time I am actually -1 on this change. I don't think its worth adding runtime logic for a small optimization |
Looks like it checks for e.g. "urllib" as opposed to "urllib.request". Since we also import urllib.parse, we can't exclude that. Excluding "http" works though. |
Some context on that number:
The 5% we get here becomes a bigger deal as those get whittled down |
trimming import time is always good; sure it sometimes makes the code a bit more complex but +1 here |
I'm ok with this, especially on the aggregate benefits, but suspect it will be brittle, as someone could in the future (unaware) undo this optimization, so we only get left with the complexity, and no benefit. Maybe add tests for the content of sys.modules in a seperate CI run? |
Does the code check added not catch that? |
Ah nevermind - I see that is only for the http module not urllib |
Im also thinking more general, imports in imported modules etc. |
We could modify the check. I don't know why I originally wrote it to just check the package. |
edited code check |
Wouldn't it make sense to have a whitelist instead of a blacklist? A blacklist will allow a lot of things that we might not really want when importing pandas. |
Let's leave the idea of a whitelist changeover for a separate discussion. Other than that, I think everything has been addressed here. |
A merge of master should fix CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging later today.
can you rebase |
Rebased +green |
Thanks. |
* PERF: trim import time ~5% with lazy imports
* PERF: trim import time ~5% with lazy imports
by avoiding/lazifying expensive stdlib imports.
Exact timings are tough because repeated runs of
python3 -X importtime -c "import pandas as pd"
are really high variance, but my general read is that this trims about 35 ms out of about 650 ms.